
    Using Selection Pressure as an Asset to Develop Reusable, Adaptable Software Systems

    The Goddard Earth Sciences Data and Information Services Center (GES DISC) at NASA has over the years developed and honed several reusable architectural components for supporting large-scale data centers with a large customer base. These include a processing system (S4PM) and an archive system (S4PA), both based upon a workflow engine called the Simple, Scalable, Script-based Science Processor (S4P), and an online data visualization and analysis system (Giovanni). These subsystems are currently reused internally in a variety of combinations to implement customized data management on behalf of instrument science teams and other science investigators. Some of these subsystems (S4P and S4PM) have also been reused by other data centers for operational science processing. Our experience has been that development and use of robust, interoperable, and reusable software systems can actually flourish in environments defined by heterogeneous commodity hardware, an emphasis on value-added customer service, and a continual goal of achieving higher cost efficiencies. The repeated internal reuse fostered by such an environment encourages, and even forces, changes to the software that make it more reusable and adaptable. Allowing and even encouraging such selective pressures on software development has been a key factor in the success of S4P and S4PM, which are now available to the open source community under the NASA Open Source Agreement.

    Introduction to Analysis Methods for Big Earth Data

    Big Earth Data are too big to be tractable to simple data inspection and require models to make sense of all the data. Useful models for Big Earth Data may be physical, statistical, or machine-learning based. In many cases, hybrid models combine attributes of two or more of these types.

    Capturing, Harmonizing and Delivering Data and Quality Provenance

    Satellite remote sensing data have proven to be vital for various scientific and application needs. However, the usability of these data depends not only on the data values but also on the ability of data users to assess and understand the quality of these data for various applications and for comparison or combined use of data from different sensors and models. In this paper, we describe some aspects of capturing, harmonizing, and delivering this information to users in the framework of distributed web-based data tools.

    Future of Big Earth Data Analytics

    The state of the art of Big Earth Data Analytics can be expected to evolve rapidly in the coming years. The forces driving this evolution come from both growth in the data and advancement in the field of data analytics. In the data area, advances in sensor instrumentation and platform miniaturization are increasing both data resolution and coverage, resulting in enormous growth in data Volume. Increases in temporal resolution in particular also generate demands for higher data Velocity. At the same time, the proliferation of instruments and the platforms on which they reside is increasing the Variety of datasets. The increase in Variety in turn leads to questions about the Veracity of the data. In the algorithm area, powerful machine learning methods are coming to the fore, particularly Deep Neural Networks. These are powerful at detecting interesting features in the data, integrating many different measurements (i.e., data fusion), and solving classification problems. However, they remain challenging to use when seeking explanations of how natural or socio-economic phenomena work using Earth Observations. Thus, classical analysis techniques will remain relevant when the emphasis is on forming or testing explanations, as well as for supporting interactive data exploration.

    Taming Big Data Variety in the Earth Observing System Data and Information System

    Although the volume of the remote sensing data managed by the Earth Observing System Data and Information System is formidable, an oft-overlooked challenge is the variety of the data. The diversity in satellite instruments, science disciplines, and user communities drives cost as much as, or more than, the data volume. Several strategies are used to tame this variety: allocation of data to distinct centers of expertise; a common metadata repository for discovery; data format standards and conventions; and services that further abstract the variations in the data.

    Proto-Examples of Data Access and Visualization Components of a Potential Cloud-Based GEOSS-AI System

    Once a research or application problem has been identified, one logical next step is to search for available relevant data products. Thus, an early component of a potential GEOSS-AI system, in the continuum between observations and end-point research, applications, and decision making, would be one that enables transparent data discovery and access by users. Such a component might be effected via the system's data agents. Presumably, some kind of data cataloging has already been implemented, e.g., in the GEOSS Common Infrastructure (GCI). Both the agents and the cataloging could also leverage existing resources external to the system. The system would have some means to accept and integrate user-contributed agents. The need or desirability of some data format internal to the system should be evaluated. Another early component would be one that facilitates browsing visualization of the data, as well as some basic analyses. Three ongoing projects at the NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) provide possible proto-examples of potential data access and visualization components of a cloud-based GEOSS-AI system. 1. Reorganizing data archived as time-step arrays into point time series (data rods), as well as leveraging the NASA Simple Subset Wizard (SSW), to significantly increase the number of data products available, at multiple NASA data centers, for production as on-the-fly (virtual) data rods. SSW's data discovery is based on OpenSearch. Both pre-generated and virtual data rods are accessible via Web services. 2. Developing Web Feature Services to publish the metadata, and expose the locations, of pre-generated and virtual data rods in the GEOSS Portal and to enable direct access to the data via Web services. SSW is also leveraged to increase the availability of both NASA and non-NASA data. 3. Federating NASA Giovanni (Geospatial Interactive Online Visualization and Analysis Infrastructure), for multi-sensor data exploration, which would allow each cooperating data center, currently the NASA Distributed Active Archive Centers (DAACs), to configure its own Giovanni deployment, while also allowing all the deployments to incorporate each other's data. A federated Giovanni comprises Giovanni Virtual Machines, which can be run on local servers or in the cloud.
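    The time-step-to-time-series reorganization behind data rods can be illustrated with a minimal sketch (NumPy stands in for the archive; the array shapes and variable names here are hypothetical, not the actual service): a stack of (time, lat, lon) arrays is transposed so that each grid point's full history becomes one contiguous read.

```python
import numpy as np

# Hypothetical illustration of "data rods": archives typically store
# one 2-D (lat, lon) array per time step, so reading one point's full
# time series touches every file. Reorganizing so the time axis is
# innermost makes a point time series a single contiguous slice.

rng = np.random.default_rng(0)
n_time, n_lat, n_lon = 8, 4, 5
time_step_arrays = rng.random((n_time, n_lat, n_lon))  # archive layout

# "Data rod" layout: shape (lat, lon, time), contiguous in time.
data_rods = np.ascontiguousarray(time_step_arrays.transpose(1, 2, 0))

# Retrieving the full time series at one grid point is now one read.
series = data_rods[2, 3, :]
assert np.array_equal(series, time_step_arrays[:, 2, 3])
```

The same values are stored either way; only the storage order changes, which is why pre-generated rods trade duplication of data for fast point-wise access.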

    Introduction to Big Earth Data Applications

    Climate and weather modeling generate enormous data volumes that make iterative analysis challenging, spurring the development of new ways to work with the data. A theme running across applications is the need to identify and highlight "interesting" data for the scientist to focus on. Operational applications often scale up from small, local studies to larger spatial scales with more analysis targets.

    Data Access Services that Make Remote Sensing Data Easier to Use

    This slide presentation reviews some of the processes that NASA uses to make remote sensing data easier to use over the World Wide Web. This work involves much research into data formats, geolocation structures, and quality indicators, often followed by coding a preprocessing program. Only then are the data usable within the analysis tool of choice. The Goddard Earth Sciences Data and Information Services Center is deploying a variety of data access services designed to dramatically shorten the time consumed by the data preparation step. On-the-fly conversion to the standard network Common Data Form (netCDF) format with Climate and Forecast (CF) conventions imposes a standard coordinate system framework that makes data instantly readable through several tools, such as the Integrated Data Viewer, the Grid Analysis and Display System, Panoply, and Ferret. A similar benefit is achieved by serving data through the Open-source Project for a Network Data Access Protocol (OPeNDAP), which also provides subsetting. The Data Quality Screening Service goes a step further, filtering out data points based on quality control flags, according to science team recommendations or user-specified criteria. Further still is the Giovanni online analysis system, which goes beyond handling formatting and quality to provide visualization and basic statistics of the data. This general approach of automating the preparation steps has the important added benefit of enabling use of the data by non-human users (i.e., computer programs), which often make sub-optimal use of the available data due to the need to hard-code data preparation on the client side.
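    The kind of filtering such a screening service performs can be sketched in a few lines (this is a minimal illustration, not the service's actual interface; the flag convention, values, and threshold below are assumptions): points whose quality flag exceeds a chosen threshold are replaced with fill values so downstream statistics ignore them.

```python
import numpy as np

# Hypothetical quality screening sketch. Assumed flag convention:
# 0 = best, higher = worse; the threshold plays the role of a
# science-team recommendation or user-specified criterion.

values = np.array([280.1, 281.5, 279.9, 283.2, 282.0])  # e.g., K
qc_flags = np.array([0, 2, 1, 3, 0])
max_acceptable_flag = 1  # keep only "best" and "good" retrievals

# Failing points become NaN, so nan-aware statistics just work.
screened = np.where(qc_flags <= max_acceptable_flag, values, np.nan)
mean_good = np.nanmean(screened)
```

Doing this screening on the server side spares every client from re-implementing (and possibly mis-implementing) the flag logic.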

    Analysis Ready Data in Analytics Optimized Data Stores for Analysis of Big Earth Data in the Cloud

    Cloud computing offers the possibility of making the analysis of Big Data approachable for a wider community, due to affordable access to computing power, an ecosystem of usable tools for parallel processing, and the migration of many large datasets to archives in the cloud, allowing data-proximal computing. Generally, data analysis acceleration in the cloud comes from running multiple nodes in a split-apply-combine strategy. Data systems such as the Earth Observing System Data and Information System are in a position to "pre-split" the data by storing them in a data store that is optimized for data-parallel computing, i.e., an Analytics-Optimized Data Store (AODS). A variety of approaches to AODS are possible, from highly scalable databases to scalable filesystems to data formats optimized for cloud access (e.g., zarr and cloud-optimized datasets), with the optimal choice dependent on both the types of analysis and the geospatial structure of the data. A key question is how much preprocessing of the data to do, both before splitting and as the first part of the apply step. Again, the geospatial structure of the data and the analysis type influence the decision, with the added complexity of the user type. Trans-disciplinary users who are not well versed in the nuances of quality filtering and georeferencing of remote sensing orbit/swath/scene data tend to ask for more highly processed data, relying on the data provider to make sensible decisions on preprocessing parameters. (This accounts for the popularity of "Level 3" gridded data, despite the lower spatial resolution it provides.) In this case, data can be preprocessed before the split, resulting in higher performance in the rest of the apply step, which can be transformative for use cases such as interactive data exploration at scale. Discipline researchers who are experienced with remote sensing data often prefer more flexibility in customizing the preprocessing of data into Analysis Ready Data, resulting in more need for on-the-fly preprocessing.
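    The pre-split/apply/combine strategy above can be sketched with a minimal NumPy example (an assumption-laden stand-in for a real cloud AODS): the data store holds the variable already split into chunks, each chunk is reduced independently (the parallelizable apply step), and the partial results are combined into an exact global statistic.

```python
import numpy as np

# Minimal sketch of split/apply/combine over a pre-split data store.
# NumPy arrays stand in for AODS chunks (e.g., zarr); shapes and the
# chunking scheme are illustrative assumptions.

def apply_step(chunk):
    # Per-chunk reduction: return (sum, count) so the combine step
    # can form an exact global mean from partial results.
    return chunk.sum(), chunk.size

rng = np.random.default_rng(1)
data = rng.random((360, 180))             # stand-in gridded variable
chunks = np.array_split(data, 4, axis=0)  # the "pre-split" in the AODS

# Apply: each chunk reduced independently (parallelizable per node).
partials = [apply_step(c) for c in chunks]

# Combine: merge partial (sum, count) pairs into the global mean.
total, count = (sum(xs) for xs in zip(*partials))
global_mean = total / count
assert abs(global_mean - data.mean()) < 1e-9
```

Returning (sum, count) pairs rather than per-chunk means is the key design choice: it keeps the combine step exact even when chunks have unequal sizes.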